-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate performance tests to Playwright #51084
Conversation
Size Change: +40.4 kB (+3%) Total Size: 1.44 MB
ℹ️ View Unchanged
|
Flaky tests detected in e77d0b7. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/5345949860
|
e9e0177
to
2f163a3
Compare
3e61312
to
4bf1c77
Compare
28e5037
to
d9f2672
Compare
I've prepared a Playwright vs. Puppeteer metrics comparison by running all the specs against both frameworks in a single CI job. Here are the results: post-editor
site-editor
front-end-classic-theme
front-end-block-theme
The biggest difference seems to be around the initial painting area (FP, FCP, LCP), which might be caused by the initialization of the custom e2e utils for each page, as mentioned by @kevin940726 in #51084 (comment). I'd like to try going vanilla Playwright to see if this is true, but I'd leave it for a follow-up PR. For this PR, I'd leave Puppeteer running perf specs as they are and only merge the migrated ones. Once we agree that the numbers are 👌, we do the switch. How does that sound, @youknowriad @oandregal @dmsnell? |
@WunderBart thanks again for all the work you've put into this. I'm afraid I don't have much of value to offer, other than I don't see any reason this shouldn't be done. it's obvious the metrics will change; which I figure is the tradeoff you see as valuable enough to accept in order to eliminate Puppeteer from the repo. That's something you have more context about than I do, so it sounds fine enough. I'll add my approval which is just that: me voicing my thoughts and nothing more. Please lean on whatever other review you see fitting, as well as your own vetting and testing. |
There is some confusion at the moment with existing discrepancies in testing metrics. After speaking with @youknowriad it seems like this could be a poor time to change them even more dramatically until we all know what happened and how to fix consistency. For background, it appears like changing the base version against which performance numbers are compared has skewed the measurements, or led to reporting slower performance when there's suspicion that it isn't slower. All this is to say I pulled my approval back because we need to figure these things out first. @WunderBart have you considered the impact on release performance reporting? Have you thought about running the tests against old versions to establish new baselines for comparing performance? |
Possibly this was the reason for the skewed metrics? #51407 (comment)
Yes, although for this PR, I only wanted to do the 1:1 migration (no improvements, do with Playwright the same thing we do with Puppeteer) and merge without switching to Playwright. In the follow-up PRs, I wanted to address the aforementioned inconsistencies and improvements and collect all the measurements necessary for the switch. Right now, the 1:1 migration is done, and the code is working. Merging this PR would not affect the current performance job at all. |
Thanks for all the work on this PR. Just want to make sure that we're still producing the same files as output so we continue to log things properly. Other than that I'm fine merging this. |
I think after a dozen commits, we'll get a clear idea about whether this is more stable or less stable than before. |
edit: I accidentally deleted the comment instead of just editing it, so reposting now. Sorry for the noise! Below are the results of 100 rounds of the Puppeteer vs. Playwright perf comparison job for the post editor suite. A hundred rounds were way too much for CI (which is how I learned about the 6h job timeout), so I decided to run it locally overnight. Click here for the source sheet. Regarding the results, one thing that can be noticed immediately is that Puppeteer shows more stability in the load aspect, while Playwright is definitely better with the UI. Some metrics also show significant yet regular bumps ( I'd like to move forward with this PR (uninvasively), so please let me know what you think, folks! 🙌 /cc @dmsnell @youknowriad @oandregal @kevin940726 Loading performance
UI performance
|
I defer to @youknowriad and the aforementioned issue with the base version (which I think might already be resolved) - as far as my review is concerned this seems fine to merge. |
Cool! This PR's code is totally passive - it doesn't change anything in the performance metrics area, just adds the migrated tests so that we have a starting point. |
As per #51084 (comment) and #51084 (comment), this one's safe to merge as it doesn't yet switch to Playwright. The remaining suggestions will be addressed in a follow-up PR so there's a clear distinction between the migration and enhancements. /cc @kevin940726 |
What?
This PR migrates existing performance tests from Puppeteer to Playwright. The aim is to stay as close to the original tests as possible to prevent skewing the numbers but, at the same time, improve the overall stability, which Playwright has proven to be good at. Any inconsistencies or improvements (e.g., #51379 #51383 #51376) should be addressed in a follow-up PR.
Metrics
The following compares before and after-migration performance metrics (average values, as calculated in the reporter script). Both jobs were run locally so that the milliseconds might differ from the ones in CI. The timings are not so important, though - the % change is.
edit: See the long-running comparison in #51084 (comment)
Classic Theme Spec
Block Theme Spec
Post Editor Spec
Site Editor Spec
Reporters
There is a slight difference in reporting, where Playwright lists each test case vs. only the top-level suite.
Execution Speed
Testing locally, Playwright's execution is, on average, 20% faster than Puppeteer's (~4 vs. ~5 minutes per job). The following factors most likely cause this:
RequestUtils
that perform backend requests to, e.g., switch theme instead of clicking through the UI, andAuto-waiting
, which automatically releases the element as actionable as soon as possible.I've also noticed, while testing headfully, that Playwright is generally executing faster than Puppeteer. I'm unsure if we're missing some throttling config, but the metrics seem to stay consistent after the migration, so I wouldn't worry about that much, I guess.
Browsers
The frameworks provided the following browsers (running on M1 Pro, Ventura 13.1):